Exploring PSI-MI XML Collections Using DescribeX

Authors

  • Reza Samavi
  • Mariano P. Consens
  • Shahan Khatchadourian
  • Thodoros Topaloglou
Abstract

PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, different data providers interpret and instantiate the proposed XML schema with some degree of heterogeneity. Analyzing schema instantiation in large collections of XML data is a challenging task that existing tools do not support. In this study we use DescribeX, a novel visualization technique for (semi-)structured XML formats, to analyze PSI-MI XML collections quantitatively and qualitatively at the instance level. The goal is to gain insight into schema usage and to study specific questions such as the adequacy of controlled vocabularies, the detection of common instance patterns, and the evolution of different data collections. Our analysis shows that DescribeX enhances understanding of the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.
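
Instance-level analysis of this kind starts from counting which element paths actually occur, and how often, across a collection. The Python sketch below illustrates that idea on two toy documents. It is not DescribeX's implementation, only a simple label-path summary under stated assumptions. The functions `label_path_counts` and `differing_paths` and the two sample documents are hypothetical. The element names are loosely modeled on PSI-MI 2.5 but heavily simplified. Because real PSI-MI files typically declare an XML namespace, the sketch strips namespace prefixes. The two hypothetical providers differ in one plausible way: one lists interactors once and refers to them by id, while the other inlines each interactor inside its participant.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable, Tuple


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, e.g. '{net:sf:psidev:mi}entry' -> 'entry'."""
    return tag.rsplit("}", 1)[-1]


def label_path_counts(documents: Iterable[str]) -> Counter:
    """Count element instances per root-to-node label path across a collection.

    All elements sharing a label path fall into one summary node. This is a
    simplified path summary, not DescribeX's own summary algorithm.
    """
    counts: Counter = Counter()
    for text in documents:
        root = ET.fromstring(text)
        stack = [(root, local_name(root.tag))]
        while stack:
            elem, path = stack.pop()
            counts[path] += 1
            for child in elem:
                stack.append((child, path + "/" + local_name(child.tag)))
    return counts


def differing_paths(a: Counter, b: Counter) -> Dict[str, Tuple[int, int]]:
    """Return {path: (count_in_a, count_in_b)} for every path whose counts differ."""
    return {p: (a[p], b[p]) for p in sorted(set(a) | set(b)) if a[p] != b[p]}


# Hypothetical, simplified PSI-MI-like documents (no namespace, names, or xrefs).
# Provider A: interactors listed once at entry level, referenced by id.
PROVIDER_A = """<entrySet><entry>
  <interactorList><interactor id="1"/><interactor id="2"/></interactorList>
  <interactionList><interaction><participantList>
    <participant><interactorRef>1</interactorRef></participant>
    <participant><interactorRef>2</interactorRef></participant>
  </participantList></interaction></interactionList>
</entry></entrySet>"""

# Provider B: each interactor inlined inside its participant.
PROVIDER_B = """<entrySet><entry>
  <interactionList><interaction><participantList>
    <participant><interactor id="1"/></participant>
    <participant><interactor id="2"/></participant>
  </participantList></interaction></interactionList>
</entry></entrySet>"""

if __name__ == "__main__":
    diff = differing_paths(label_path_counts([PROVIDER_A]),
                           label_path_counts([PROVIDER_B]))
    for path, (n_a, n_b) in diff.items():
        print(f"{n_a:>3} {n_b:>3}  {path}")
```

Comparing per-path counts, rather than only the sets of paths, also exposes frequency differences. An example is an optional element that one provider always fills and another rarely uses.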

Similar resources

DescribeX: A Framework for Exploring and Querying XML Web Collections

Flavio Rizzolo, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2008. The nature of semistructured data in web collections is evolving. Even when XML web documents are valid with regard to a schema, the actual structure of such documents exhibits significant variations across collections for s...

Fast Answering of XPath Query Workloads on Web Collections

Several web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly larger web collections containing hundreds of thousands of XML documents. Even when tasks only need to deal with a single document at a time, developers benefit from understanding the behaviour of XPath ...

RpsiXML: Application Examples

RpsiXML allows the communication between protein interaction data stored in PSI-MI XML format and the statistical and computational environment of R and Bioconductor. In the vignette RpsiXML, we introduced how to read in PSI-MI XML 2.5 files with RpsiXML. In this vignette, we illustrate the use of RpsiXML with example. These applications demonstrate the power of the package in analyzing protein...

Summary-based Comparison of Data Quality across Public MAGE-ML Genomic Datasets

In this paper we apply techniques based on DescribeX, a summary-based visualization tool for XML, to analyze data quality in MAGE-ML datasets, extending our previous work by comparing different data sources and data quality evolution.

Capturing cooperative interactions with the PSI-MI format

The complex biological processes that control cellular function are mediated by intricate networks of molecular interactions. Accumulating evidence indicates that these interactions are often interdependent, thus acting cooperatively. Cooperative interactions are prevalent in and indispensable for reliable and robust control of cell regulation, as they underlie the conditional decision-making c...

Journal title:
  • J. Integrative Bioinformatics

Volume: 4, Issue:

Pages: -

Publication date: 2007